FAST VISSIM ON IMPACT GRAPHICS

by Paul Hansen, ISD Graphics Software

This article describes work done on an example program implementing geo-specific terrain-following, courtesy of Patrick Bouchaud (Cortaillod), with program enhancements by the author.
Optimizing VISSIM applications for the Impact graphics platform is an exciting challenge, similar to the challenge of developing mid-range graphics: to provide the fullest range of features and the highest performance possible within severe constraints of cost and space. With Impact graphics texture-mapping, a fast and powerful system is available to application developers, but special care must be taken to effectively utilize this speed and power.
The task of providing platform-specific optimizations for VISSIM is best left to libraries like the Iris Performer. In fact, work is currently under way to incorporate this information into Performer. This article is meant to be generally informative, especially for those who do write in OpenGL.
On an Impact graphics system with hardware texture-mapping (High or Maximum Impact), the amount of "bulk" texture memory is either 1 MB or 4 MB (with the TRAM option card): a substantial amount, but hardly enough for enthusiastic users of texture. Furthermore, the way textures are stored in the TRAM chips sometimes does not fully utilize the memory. TRAM memory consists of 256 "pages"; if an image uses less than a full page, the rest of the page can't be used for other images. The exception to this is the so-called "packed page"; the smallest five mipmap levels of a texture can share a single page. But even with this packing some memory will be unusable. (For those interested in the exact calculation of TRAM page requirements, a utility called "tpage" is available at the Web address given at the end of this article.)
Some applications will reduce the size and number of textures and their component depth in order to keep their full set of textures always resident in texture memory. However, since Impact graphics does have a fast download rate it is possible to tolerate a certain amount of texture downloading on a per-frame basis and still have an acceptable frame rate.
There are three ways to download image data into the TRAM chip(s): one, "load" the data with glTexImage() or glTexSubImage() calls; two, "copy" the data from a read source with glCopyTexImage() or glCopyTexSubImage(); and three, provoke (or rely upon) the system "texture manager" software to restore the data from its own storage in CPU memory. In the last option, one uses glBindTexture() to define the full set of textures; when the texture memory becomes over-subscribed, data is saved into host memory and restored whenever that texture is bound and used to texture-map an object. This is the method used in the example program.
Advantages to using the system save/restore mechanism:

data is saved into a special "pool" of host memory which is always "locked"; this allows the restore to bypass the kernel routine which locks the memory, saving time.

the system can go directly to the restore routine and use a highly optimized path, without having to check application-supplied arguments and check for "pixel-path" operations, such as scale-and-bias.

when the system restores the packed-page (see above), five small mipmap levels are downloaded in a single operation, saving the overhead of five separate calls to glTexSubImage().

the method is simple and convenient; simply bind and draw.

the restore mechanism responds to the "texture-LOD" extension, which allows the application to declare a sub-range of mipmap levels. The system will restore only the levels in the range. This is effective applications-control of image downloading.

the method can automatically take advantage of any future enhancements and optimizations in the TRAM management system software.
The original version of the program features Patrick Bouchaud's fast culling algorithm. It computes the intersection of the viewing frustum and the terrain volume (latitude-longitude-altitude). For each latitude at a particular stride in the terrain, a minimum and maximum longitude can quickly be found. The latitude stride and longitude stride are constants for any frame, so z-clipping is used to eliminate distant triangles that have a very small screen size. This keeps rendering from becoming "geometry-limited", which would drastically reduce the frame rate. Z-clipping is standard for VISSIM, and fog is used to obscure the appearance and disappearance of terrain at the z-clipping plane.
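The per-latitude longitude search can be illustrated with a small sketch. This is not Patrick Bouchaud's algorithm, which works on the full latitude-longitude-altitude volume; it is a simplified 2D stand-in that assumes the frustum has already been projected onto the terrain as a convex polygon. The function name longitude_spans and the (lat, lon) vertex layout are hypothetical.

```python
import math

def longitude_spans(footprint, lat_stride):
    """Return {latitude: (min_lon, max_lon)} for each strided latitude row.

    footprint: convex polygon as a list of (lat, lon) vertices. This is an
    assumed 2D simplification of the frustum/terrain intersection.
    """
    lats = [p[0] for p in footprint]
    first = math.ceil(min(lats) / lat_stride)
    last = math.floor(max(lats) / lat_stride)
    spans = {}
    n = len(footprint)
    for row in range(first, last + 1):
        lat = row * lat_stride
        hits = []
        for i in range(n):
            (lat0, lon0), (lat1, lon1) = footprint[i], footprint[(i + 1) % n]
            if lat0 == lat1:
                # Edge parallel to the row: both endpoints bound the span.
                if lat == lat0:
                    hits += [lon0, lon1]
            elif min(lat0, lat1) <= lat <= max(lat0, lat1):
                # Linear interpolation along the edge to the row's latitude.
                t = (lat - lat0) / (lat1 - lat0)
                hits.append(lon0 + t * (lon1 - lon0))
        if hits:
            spans[lat] = (min(hits), max(hits))
    return spans

# A triangular footprint opening away from the eye at (0, 0).
spans = longitude_spans([(0, 0), (4, -2), (4, 2)], lat_stride=1)
```

Only the terrain cells between the minimum and maximum longitude of each row would then be submitted for drawing.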
The program originally used an 8-bit luminance texture, non-mipmapped. Image size was 1024x1024, a filtered version of a 6000x6000 image of a mountainous region in France. This size and type allowed the texture to be fully resident, without tiling, in a 1-TRAM system. A luminance texture is certainly not as gratifying as full color, but is a good example for applications that do use 1-channel data. Also, it can take advantage of the "texture-select" extension, discussed below.

Program enhancements by the author

1. Full resolution with tiling.
For luminance data, the full resolution amounts to 36 MB when the image is placed into a 6144x6144 array. To deal with an image of this size, a tiling algorithm was developed, which can be viewed in the program source code and showcase documentation on the Developer Toolbox. Increasingly, applications will need to handle large images efficiently. On high-end systems tiling is rarely necessary; on Impact systems it will become commonplace. Application software will experience a paradigmatic difference in dealing with these two types of platforms.
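The 36 MB figure can be checked with a few lines of arithmetic. The sketch below assumes that 6144 is the 6000-texel image side rounded up to a whole number of 256-texel tiles (the tile size given later in the article); the article does not state this explicitly, and padded_size is a hypothetical helper.

```python
import math

IMAGE_SIZE = 6000  # texels per side of the source image (from the article)
TILE_SIZE = 256    # tile edge used by the example program (from the article)

def padded_size(image_size, tile_size):
    """Round the image side up to a whole number of tiles (assumed rule)."""
    return math.ceil(image_size / tile_size) * tile_size

side = padded_size(IMAGE_SIZE, TILE_SIZE)
level0_bytes = side * side          # 8-bit luminance: one byte per texel
level0_mb = level0_bytes / 2**20    # megabytes for mipmap level zero only
```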
There are two basic ways to tile textures seamlessly in OpenGL: one, with border data, which is the way it was designed to be done, and two, by "overlapping" tiles so that neighboring tiles share edge texels. The latter method uses fewer TRAM pages, but has some important drawbacks; it requires special handling of texture coordinates based on the size of the tiles, and the overlapping edge messes up power-of-two sizes, with the unfortunate side effect that mipmapping cannot be done properly. For these reasons the second method was rejected for the example program.
It's interesting to note that on high-end graphics, the Reality Engine doesn't support border data (because it was designed before the OpenGL) and the Infinite Reality has deferred support of border data to a later software release (post-6.2MR). The man page for glTexImage2D() recommends that applications use the new clamp-to-edge extension which is similar to edge clamping in the IrisGL. However, this method will produce artifacts at tile boundaries, which will be more noticeable when the texture is magnified. Another problem is that the extension will never be supported on Impact graphics. The best solution for tiling applications that are to run on both platforms may be to use border data and target a future software release (for IR), or detect the platform and code for both.
In the example program, the visible area of any tile is completely drawn before going on to the next tile (see figure 1). The central area of the tile is drawn with triangle strips, as if there were no tiling. Around the central area is a seam of triangle fans, drawn to the tile edge (see figure 2). One might like to store the interpolated vertices along the tile edge, since the same interpolations are done over and over. However, the program seems to run well without this and the host is probably running well ahead of graphics anyway. The same logic applies to the texture coordinates; it's very convenient to use texgen to have the texture coordinates generated in microcode, but the amount of double precision math already involved in processing the textured triangles means that the operation may be better balanced if the host also provides the texture coordinates.
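As an illustration of host-supplied texture coordinates, the sketch below converts a position measured in level-zero texels into a coordinate local to one tile. It assumes 256-texel tiles with border data, so that the range 0.0 to 1.0 spans the tile interior; tile_local_coord is a hypothetical helper, not a routine from the example program.

```python
def tile_local_coord(texel, tile_index, tile_size=256):
    """Map a global texel position to a texture coordinate within one tile.

    With border data, 0.0 and 1.0 are the tile's interior edges; the border
    texels lie just outside that range and are only touched by filtering.
    """
    return (texel - tile_index * tile_size) / tile_size

# A vertex on the shared edge of tiles 0 and 1 gets a different coordinate
# depending on which tile is currently bound.
edge_in_tile0 = tile_local_coord(256, 0)
edge_in_tile1 = tile_local_coord(256, 1)
inside_tile1 = tile_local_coord(320, 1)
```

The same vertex on a tile edge is therefore issued twice, once per tile, which is why the seam of triangle fans is drawn exactly to the edge.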

figure 1. figure 2.

2. Mipmapping.
One is tempted to say "forget mipmapping" when the level zero image alone is 36 Mb plus border storage. Without mipmapping there is heavy aliasing, producing a sparkling appearance that is quite noticeably bad. But with Impact graphics, there's another very good reason to want mipmapping: to get a faster fill rate. The TRAM chip contains a high-speed cache that requires sufficient "look-ahead" to avoid delays in getting the correct data. The look-ahead is based on spatial coherence; when a texture is severely minified, the samples skip around in the image, losing coherence. Even with mipmapping, this effect is present to a small degree; but without mipmapping, fill rates can drop by as much as 90 per cent!
Applications that use mipmapped textures often read the base level off disk, and use a library utility to compute the mipmaps and call glTexImage2D(). Since the example program is tiling, this is not an option. The utility doesn't know about tiling, and for the above reasons doesn't know about border data. And even if it did, the fact that border data is being used means that the mipmaps would best be created for the whole image, and not individual tiles. This would require the entire image to be brought into user memory; even with an in-place algorithm, once the tiles start getting downloaded, the TRAM(s) will quickly become filled, and the system will start using host memory to save the overflow. Eventually the system will have two copies of the data, plus one copy of the mipmaps. To avoid this, and save processing time, the image was pre-processed; the mipmaps were calculated and the tiles (with border data) extracted and dumped into a binary file, affectionately called "cooked.bin". Also, groups of four tiles were interlaced and stored as a single 4-channel tile to take advantage of the "texture-select" extension on 4-TRAM systems, discussed below.
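The pre-processing order described here (mipmap the whole image first, then cut tiles with borders) can be sketched on a toy image. The 2x2 box filter and the clamping of the border at the outer image edge are assumptions; the article does not say which filter or edge rule was used to build "cooked.bin". Both function names are hypothetical.

```python
def downsample(img):
    """Halve a square image with a 2x2 box filter (an assumed filter)."""
    n = len(img) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(n)] for y in range(n)]

def tile_with_border(img, ty, tx, tile):
    """Extract tile (ty, tx) plus a one-texel border.

    Border texels come from the neighbouring tiles of the same mipmap level;
    at the outer image edge the coordinate is clamped (an assumed rule).
    """
    size = len(img)

    def clamp(v):
        return max(0, min(size - 1, v))

    return [[img[clamp(ty * tile + y)][clamp(tx * tile + x)]
             for x in range(-1, tile + 1)] for y in range(-1, tile + 1)]

# Toy 4x4 image cut into 2x2 tiles; the tile edge halves with each level.
level0 = [[y * 4 + x for x in range(4)] for y in range(4)]
level1 = downsample(level0)
top_right = tile_with_border(level0, 0, 1, tile=2)
```

Because each level is filtered before tiling, the border of a tile at any level matches the interior of its neighbour at that level.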
The tile size used is 256, so there are 576 tiles; with 9 mipmap levels each, this means that, for 1-TRAM systems, glTexImage2D() is called 5184 times. The total space required for the save/restore mechanism is 76.5 MB, so the example program definitely needs enough extra RAM to keep from disk thrashing. Without border data the space requirement is 58.5 MB, so the border data adds about 30% overhead. The percentage overhead would be reduced if larger tiles were used.
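The counts in this paragraph follow directly from the image and tile sizes. The sketch below reproduces them; the two storage figures are taken from the article as given and are not derived here, since they depend on TRAM page rules that the article does not spell out.

```python
IMAGE_SIDE = 6144  # padded image side in texels (from the article)
TILE = 256         # tile side in texels (from the article)

tiles_per_side = IMAGE_SIDE // TILE
tile_count = tiles_per_side ** 2
mip_levels = TILE.bit_length()      # 256, 128, ..., 1 for a power-of-two tile
teximage_calls = tile_count * mip_levels

# Storage figures quoted in the article (MB), used only for the ratio.
with_border_mb = 76.5
without_border_mb = 58.5
border_overhead = (with_border_mb - without_border_mb) / without_border_mb
```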

3. LOD (Level-Of-Detail) control.
On the 6.2 release software, one can take advantage of the new texture-LOD extension in the OpenGL. This is by far the most important feature for optimizing VISSIM on Impact graphics. Without it, there's no way for the application to keep the system from downloading hordes of data that never get used in a frame, when a textured object is off in the distance, using only small mipmap levels. The original intent of this extension, as indicated in the spec, was to facilitate applications-control of downloading at texture definition time; i.e., the mipmap levels outside the desired range don't even have to be defined in order for the range to qualify as a valid texture. When the user moves closer to the object and larger levels are needed, new levels can be defined and included in the range. This mode does save texture memory, and works correctly on Impact graphics, but a higher performance result can be obtained by defining all the levels at startup, and using the extension to control system restores.
Of course, the application needs to derive an LOD value for the polygon or group of polygons to be rendered, to determine the range of mipmap levels needed. Some applications and libraries already have LOD values for objects to control geometry since fewer polygons need to be drawn if the object is far away. So it's a relatively small step to relate this LOD to the size of the texture being used, and then set the range limits.
LOD is a function of the size of the texture and, along with the perspective view parameters, the eye distance to the object. On a per-polygon basis, LOD for texture-mapping is also a function of the "tilt" of the polygon, or the angle between the view vector and a vector normal to the polygon. When the geometry is tilted, smaller mipmap levels are used by the system. Imagine flying over flat terrain; typically the geometry does have a significant tilt, and an application that considers only eye-distance can be wasting some downloads. Of course, terrain is not always flat, and in this case polygons that would use a higher resolution mipmap will get clamped to the smaller levels of the specified range. This explains why, in the example program, if one looks down on the terrain the view is more "crisp" than in a fly-over mode.
In the program, LOD is calculated by taking a ratio of the distance to a vertex over the distance at which a perpendicular view shows the image at full size (resolution) on the screen. This ratio is modified by the cosine of the view angle away from straight down. Finally, one needs to take the base-2 logarithm of this value; as a quick hack the exponent of the floating point number is masked and shifted to produce an integer. This process is actually done separately in latitude and longitude and the maximum of these is taken as the LOD, to conform to the system algorithm for calculating LOD in the texture-engine chip. The vertex-of-interest is taken as the central corner of a group of four tiles that are part of a "quad-selectable", using the texture-select extension (see below). To be more correct, the tile corner closest to the eye should be used, and this accounts for a certain amount of "resolution popping" during fly-overs. Also, the transition points from one LOD to another can be more carefully adjusted to reduce popping.
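A minimal sketch of this calculation follows. Two points are assumptions: the ratio is divided by the cosine (so that a tilted view selects smaller mipmap levels, as described above), and math.frexp stands in for the mask-and-shift of the floating point exponent. The function name and parameters are hypothetical.

```python
import math

def integer_lod(distance, full_res_distance, view_angle_rad, max_level):
    """Integer LOD: floor(log2(ratio)) read from the float's exponent.

    view_angle_rad is the angle away from straight down. Dividing by its
    cosine is an assumed reading of "modified by the cosine".
    """
    ratio = distance / full_res_distance
    ratio /= max(math.cos(view_angle_rad), 1e-6)  # avoid dividing by zero
    if ratio <= 1.0:
        return 0  # magnified or full size: only level zero is needed
    # frexp returns (m, e) with ratio == m * 2**e and 0.5 <= m < 1,
    # so floor(log2(ratio)) == e - 1.
    _, exponent = math.frexp(ratio)
    return min(exponent - 1, max_level)

straight_down = integer_lod(3.0, 1.0, 0.0, max_level=8)
tilted = integer_lod(3.0, 1.0, math.radians(60), max_level=8)
```

Following the text, the same function would be evaluated once for latitude and once for longitude, and the larger result used.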
Once the integer LOD value is obtained and clamped to the maximum possible range, the glTexParameter() call is used to set the subrange. In the program, the LOD integer is used as the "base level" and the "max level" is set to include two smaller levels. When the view is close to straight down, perhaps only two levels are needed, and when close enough to be magnified, only one level (zero); so, additional optimizations are still possible. On Impact graphics, when the first "packed" level is included (see above), one might as well include all the smaller levels, since they will be restored together.
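The subrange rule described above can be sketched as follows, assuming 9 mipmap levels per 256-texel tile, so that the five packed levels are levels 4 through 8. The helper level_range is hypothetical; in the program the two results would be passed to glTexParameter(), presumably as the extension's base-level and max-level parameters.

```python
def level_range(lod, num_levels=9, packed_levels=5, extra_levels=2):
    """Return (base_level, max_level) for the texture-LOD subrange.

    base_level is the integer LOD; max_level adds two smaller levels. If the
    range reaches the first packed level, it is extended to the last level,
    since the packed page is restored as a unit.
    """
    last = num_levels - 1
    first_packed = num_levels - packed_levels
    base = max(0, min(lod, last))
    top = min(base + extra_levels, last)
    if top >= first_packed:
        top = last
    return base, top

near = level_range(0)
middle = level_range(2)
far = level_range(7)
```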
The integer settings of base level and max level are not the whole story; the texture-LOD extension also has two other settings, floating point values for "minlod" and "maxlod". These settings are to clamp the system calculations of LOD within the floating point range of those values, with the understanding that the integer range of levels includes or brackets this range. The purpose of this is to artificially maintain a range of smaller levels for performance reasons, for instance when moving, and then gradually relax the constraints over a few frames when motion stops. On Impact graphics the maxlod setting has no effect, since there is no functionality for this in hardware, and the minlod setting has a hardware bug that causes clamping to fail when the texture is magnified (less than 2x), and the minlod is set to greater than zero and less than one. Unfortunately this is a very common range for use. All integer settings and floating point settings greater than one work perfectly. In the example program only integer settings corresponding to the base level were used.

4. Texture-select, for luminance data and 4-TRAM systems.
This extension to the OpenGL provides a way to focus on a subset of components of a texture and view that subset as if it were a complete texture. For example, a 4-channel image can be viewed as four luminance textures or two luminance-alpha textures, and a 2-channel texture can be viewed as two luminance textures. The "selectable" is defined with a special internal format and loaded (and saved and restored) as a unit. The glTexParameter() call is used to select a subset for drawing.
There are additional benefits on different systems. On Reality Engine and Infinite Reality small components can be packed into texture memory more efficiently. Generally, the GL_LUMINANCE4_EXT internal format has no corresponding 4-bit external type for disk storage. However, using this extension four of these textures can be interlaced and stored as 16-bit values with the GL_UNSIGNED_SHORT_4_4_4_4_EXT external type. In this way, on Impact with 4 TRAMs, four 2048x1024 non-mipmapped luminance textures can be simultaneously resident in texture memory. Also on Impact, loads and restores can be faster, since there is less overhead and better usage of bus bandwidth.
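The 16-bit packing can be sketched on the host side. The helper pack_4444 is hypothetical; it places the first texture's 4-bit value in the most significant bits, which is the component order of the 4_4_4_4 packed type. The last two lines check that four 2048x1024 textures packed this way occupy exactly 4 MB.

```python
def pack_4444(l0, l1, l2, l3):
    """Pack four 4-bit luminance values into one 16-bit word.

    The first component goes in the top four bits.
    """
    for value in (l0, l1, l2, l3):
        if not 0 <= value <= 15:
            raise ValueError("each luminance value must fit in 4 bits")
    return (l0 << 12) | (l1 << 8) | (l2 << 4) | l3

word = pack_4444(0xA, 0xB, 0xC, 0xD)

# Four 2048x1024 textures interlaced as one 16-bit word per texel.
packed_bytes = 2048 * 1024 * 2
packed_mb = packed_bytes / 2**20
```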
In the example program, the image tiles, with border data, were interlaced and placed into the binary data file "cooked.bin" as selectable textures. The problem was to make this work for 1-TRAM systems, which don't support the texture-select extension. First a proxy texture was defined with the internal format set to GL_QUAD_LUMINANCE8_SGIS. When glGetError() is checked, "no error" allows us to read binary data out of the file and go directly to glTexImage2D(); in this case the program starts up in about 15 seconds. If there is an error, then the data has to be de-interlaced and the GL_LUMINANCE8_EXT internal format is used. The method used in the program to de-interlace is to declare the external type as GL_UNSIGNED_INT and let the system convert ints to bytes during the load. Using the glPixelStore() command, the data is effectively "shifted" to ensure that the valid byte is the high byte in the int. In this case the program starts up in a little over a minute.
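The interlaced layout and its inverse can be sketched with plain byte buffers. This is a host-side illustration only: the example program lets the GL do the de-interlacing during the load, as described above, and the exact byte layout of "cooked.bin" is assumed here to be four 8-bit channels per texel. Both function names are hypothetical.

```python
def interlace(tiles):
    """Interleave four equal-length 8-bit luminance tiles, texel by texel."""
    if len(tiles) != 4 or len({len(t) for t in tiles}) != 1:
        raise ValueError("expected four tiles of equal length")
    out = bytearray()
    for texels in zip(*tiles):
        out.extend(texels)  # one byte from each tile forms a 4-channel texel
    return bytes(out)

def deinterlace(quad, channel):
    """Recover one luminance tile from the 4-channel buffer."""
    return quad[channel::4]

tiles = [bytes([1, 2]), bytes([3, 4]), bytes([5, 6]), bytes([7, 8])]
quad = interlace(tiles)
third_tile = deinterlace(quad, 2)
```

On a 4-TRAM system the interlaced buffer would be loaded as is; on a 1-TRAM system each of the four tiles has to be recovered and loaded on its own, which is consistent with the longer startup time.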

CONCLUSION
Working on this program has been great fun and a good test of Impact system software, especially the new extensions. The program runs well and readers with an Impact system with enough RAM are encouraged to download it from the Developer Toolbox. On a system with 4 TRAMs, about 10-12 frames per second can be seen, and further optimization is still possible.
The program and other example programs in different application areas, as well as extensive documentation, can be viewed and obtained through the Developer Toolbox, including the Developer Toolbox Web page:
https://www.sgi.com/toolbox/src/tutorials/OGLT/
Enjoy!